Pattern-based English-Latvian Toponym Translation

Authors

  • Tatiana Gornostay
  • Inguna Skadina

Abstract

Due to their linguistic and extra-linguistic nature, toponyms deserve special treatment when they are translated. The paper deals with issues related to the automated translation of toponyms from English into Latvian. The translation process handles not only toponyms found in a dictionary but also out-of-vocabulary toponyms. Translation of out-of-vocabulary toponyms is divided into three steps: source string normalization, translation, and target string normalization. The translation step applies translation strategies and linguistic toponym translation patterns. A set of 10,000 UK-related toponyms from Geonames was used as the development set. The developed methods were evaluated on a test set: translation accuracy is 67% for the whole test set, 58% for one-word toponymic units, and 81% for multiword toponyms.
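
The dictionary lookup followed by the three-step treatment of out-of-vocabulary toponyms can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary entries, the regular-expression patterns and the function names (`normalize_source`, `translate_normalized`, `normalize_target`, `translate_toponym`) are hypothetical. The fallback strategy simply keeps the English name unchanged, where a real system would transcribe it and add Latvian inflectional endings, and placing the Latvian generic term (upe "river", ezers "lake", kalns "mountain, hill", sala "island") after the name is also an assumption of this sketch.

```python
import re

# Hypothetical bilingual toponym dictionary; keys are case-folded English forms.
TOPONYM_DICT = {
    "london": "Londona",
    "england": "Anglija",
    "scotland": "Skotija",
}

# Hypothetical translation patterns: an English regex with a <name> slot and a
# Latvian template. The generic term is assumed to follow the name in Latvian.
PATTERNS = [
    (re.compile(r"^(?P<name>.+?)\s+river$", re.IGNORECASE), "{name} upe"),
    (re.compile(r"^river\s+(?P<name>.+)$", re.IGNORECASE), "{name} upe"),
    (re.compile(r"^lake\s+(?P<name>.+)$", re.IGNORECASE), "{name} ezers"),
    (re.compile(r"^mount\s+(?P<name>.+)$", re.IGNORECASE), "{name} kalns"),
    (re.compile(r"^(?P<name>.+?)\s+island$", re.IGNORECASE), "{name} sala"),
]


def normalize_source(text: str) -> str:
    """Step 1: trim the input and collapse internal whitespace."""
    return " ".join(text.split())


def translate_name(name: str) -> str:
    """Translate the proper-name part: dictionary first, otherwise a
    placeholder strategy that keeps the English form unchanged."""
    return TOPONYM_DICT.get(name.casefold(), name)


def translate_normalized(text: str) -> str:
    """Step 2: whole-string dictionary lookup, then the first matching
    pattern, then the name strategy as a fallback."""
    key = text.casefold()
    if key in TOPONYM_DICT:
        return TOPONYM_DICT[key]
    for pattern, template in PATTERNS:
        match = pattern.match(text)
        if match:
            return template.format(name=translate_name(match.group("name")))
    return translate_name(text)


def normalize_target(text: str) -> str:
    """Step 3: collapse whitespace and capitalize the first letter."""
    text = " ".join(text.split())
    return text[:1].upper() + text[1:]


def translate_toponym(text: str) -> str:
    """Run the three steps in order."""
    return normalize_target(translate_normalized(normalize_source(text)))
```

For example, `translate_toponym("  london ")` is resolved by the dictionary and returns `Londona`, while `translate_toponym("Lake  Windermere")` falls through to the lake pattern and returns `Windermere ezers`, with the name left as-is by the placeholder strategy. Trying patterns in list order is the simplest design choice; a fuller system would need an explicit rule for choosing among several matching patterns.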

Similar resources

English-Latvian Toponym Processing: Translation Strategies and Linguistic Patterns

The paper presents a study of a challenging task in machine translation and cross-language information retrieval: the translation of toponyms. Due to their linguistic and extra-linguistic nature, toponyms deserve special treatment. The overall translation process includes two stages of processing: dictionary-based and out-of-vocabulary toponym translation. The latter is divided into three steps: s...

Improving SMT with Morphology Knowledge for Baltic Languages

In recent years, several machine translation systems have been built for the Baltic languages. Besides the Google and Microsoft machine translation engines and research experiments with statistical MT for Latvian [1] and Lithuanian, there are both English-Latvian [2] and English-Lithuanian [3] rule-based MT systems available. Both Latvian and Lithuanian are morphologically rich languages with qu...

Multi-word Expressions in English-Latvian Machine Translation

The paper presents a series of experiments that aim to find the best method for treating multi-word expressions (MWE) in the machine translation task. Methods have been investigated in the framework of statistical machine translation (SMT) for translation from English into Latvian. MWE candidates have been extracted using pattern-based and statistical approaches. Different techniques for MWE integration in...

English-Latvian SMT: knowledge or data?

In cases where phrase-based statistical machine translation (SMT) is applied to languages with rather free word order and rich morphology, translated texts are often not fluent due to misused inflectional forms and wrong word order between phrases or even inside a phrase. One possible way to improve translation quality is to apply factored models. The paper presents work on Englis...

Multi-system machine translation using online APIs for English-Latvian

This paper describes a hybrid machine translation (HMT) system that employs several online MT system application program interfaces (APIs) forming a MultiSystem Machine Translation (MSMT) approach. The goal is to improve the automated translation of English – Latvian texts over each of the individual MT APIs. The selection of the best hypothesis translation is done by calculating the perplexity...

Publication date: 2009